Skip to content

feat(ci): add ephemeral Brev E2E test infrastructure#813

Merged
cv merged 1 commit intoNVIDIA:mainfrom
jyaunches:feat/ephemeral-brev-e2e
Mar 24, 2026
Merged

feat(ci): add ephemeral Brev E2E test infrastructure#813
cv merged 1 commit intoNVIDIA:mainfrom
jyaunches:feat/ephemeral-brev-e2e

Conversation

@jyaunches
Copy link
Copy Markdown
Contributor

@jyaunches jyaunches commented Mar 24, 2026

How to use

After merge, any maintainer can trigger a full-stack E2E test against any branch:

  1. Go to Actions → "e2e-brev" → "Run workflow"
  2. Enter the branch name, optionally a PR number, pick a test suite, and leave keep_alive as true
  3. Tests run on a fresh Brev CPU instance (~15-20 min)
  4. A comment is posted on the PR with results and SSH access:

Brev E2E (full): PASSED on branch feat/my-prSee logs

Instance e2e-pr-813 is still running. To SSH in:

brev refresh && ssh e2e-pr-813

When done, delete it: brev delete e2e-pr-813

Set keep_alive to false to auto-delete the instance after tests complete.

SSH access setup

To SSH into a kept-alive instance, you need to be in the NemoClaw Brev org:

  1. Join the org: NemoClaw Brev Org Invite
  2. Install Brev CLI: brew install brev (or see docs)
  3. Login: brev login
  4. Connect: brev refresh && ssh <instance-name>

Summary

  • Vitest-based E2E test suite that provisions an ephemeral Brev CPU instance, bootstraps NemoClaw (Node.js, Docker, OpenShell, gateway, sandbox), runs tests remotely, and optionally keeps the instance alive for SSH debugging
  • GitHub Actions workflow (e2e-brev) triggered via workflow_dispatch or workflow_call
  • Results reported to PRs via check runs and comments with SSH instructions

Motivation

Security-critical PRs (e.g., #156) need testing against the real OpenShell + sandbox stack, which the existing Docker sandbox E2E and nightly cloud E2E cannot provide.

Files changed

File Change
test/e2e/brev-e2e.test.js New — vitest E2E test: create instance, bootstrap, run tests, cleanup
.github/workflows/e2e-brev.yaml New — workflow with dispatch inputs, PR reporting, keep_alive option
vitest.config.ts Modified — added e2e-brev project, excluded E2E from CLI test project
test/e2e/test-full-e2e.sh Modified — pass --yes to destroy, shfmt formatting
.gitignore Modified — added specs/

Required secrets (must be configured before first run)

BREV_API_TOKEN

# 1. Login interactively (opens browser)
brev login

# 2. Extract the refresh token
python3 -c "import json; print(json.load(open('$HOME/.brev/credentials.json'))['refresh_token'])"

# 3. Add to GitHub secrets
gh secret set BREV_API_TOKEN --repo NVIDIA/NemoClaw --body "<token>"

Note: Brev refresh tokens expire in ~24 hours. The secret needs periodic refresh.

NVIDIA_API_KEY

Already exists. Must start with nvapi-.

Validated end-to-end

All 6 phases passed on an ephemeral Brev instance triggered from GitHub Actions (fork run):

Phase Result
Phase 0: Pre-cleanup
Phase 1: Prerequisites (Docker, API key, network)
Phase 2: Install (install.sh, nemoclaw, openshell)
Phase 3: Sandbox verification (list, status, inference, policies)
Phase 4: Live inference (direct API + sandbox-to-API)
Phase 5: CLI operations (logs)
Phase 6: Cleanup (destroy sandbox)
Instance teardown (brev delete)

Known limitations

  • Brev refresh tokens expire (~24 hours)BREV_API_TOKEN needs periodic refresh
  • vLLM installs unnecessarily on CPU boxesbrev-setup.sh wastes ~3 min (separate fix)
  • Brev CLI pinned to v0.6.310 — v0.6.322 removed --cpu flag

Next steps (after this PR merges)

  1. Long-lived Brev auth token — refresh token expires daily. Investigate service account tokens with longer TTL.
  2. /test-brev PR comment triggerissue_comment workflow so maintainers can type /test-brev on a PR instead of using the Actions UI. Requires the workflow file on main first.
  3. Credential sanitization test suitetest/e2e/test-credential-sanitization.sh exercising real createSnapshotBundle() from migration-state.ts. The dropdown option is wired but no script exists yet.
  4. Nightly integration — add job in nightly-e2e.yaml calling e2e-brev.yaml via workflow_call.
  5. Skip vLLM on CPU boxes — fix brev-setup.sh GPU detection.

🤖 Generated with Claude Code

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 24, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a Brev-based end-to-end testing flow: a new GitHub Actions workflow to provision ephemeral Brev instances, a Vitest E2E project that rsyncs and runs remote tests over SSH with injected secrets, an accompanying Vitest test file, minor shell test script tweaks, and a gitignore update.

Changes

Cohort / File(s) Summary
Ignore Patterns
\.gitignore
Added specs/ to ignore patterns.
CI Workflow
.github/workflows/e2e-brev.yaml
New e2e-brev workflow with workflow_dispatch and workflow_call inputs, required Brev/NVIDIA secrets for callable use, concurrency control, creates/updates PR check-runs, runs Node install, installs pinned Brev CLI, runs Vitest e2e-brev project, posts PR comments and uploads logs on failure.
E2E Tests
test/e2e/brev-e2e.test.js, test/e2e/test-full-e2e.sh
Added Vitest suite that creates/destroys Brev instances, waits for SSH, rsyncs repo, boots remote setup, injects secrets via exported env over SSH, runs conditional test suites (full, credential-sanitization); updated shell script to use nemoclaw "$SANDBOX_NAME" destroy --yes and reformatted conditional pipelines.
Vitest Config
vitest.config.ts
Excluded test/e2e/** from the existing CLI project and added a new e2e-brev Vitest project targeting test/e2e/brev-e2e.test.js.

Sequence Diagram

sequenceDiagram
    participant GHA as GitHub Actions
    participant Brev as Brev CLI
    participant Instance as Brev Instance
    participant SSH as SSH
    participant Suite as Vitest Suite

    GHA->>Brev: create instance (CPU, detached)
    Brev-->>Instance: provision instance
    GHA->>Instance: wait for IP / readiness
    GHA->>SSH: poll SSH ("ok") with retries & brev refresh
    SSH-->>GHA: SSH reachable
    GHA->>SSH: rsync repo (excludes)
    SSH-->>Instance: transfer files
    GHA->>SSH: run remote bootstrap with exported secrets
    SSH-->>Instance: execute setup script
    GHA->>Suite: invoke Vitest remotely
    Suite->>Instance: run remote test scripts
    Instance-->>Suite: return test results (PASS/FAIL)
    GHA->>Brev: destroy instance
    Brev-->>Instance: terminate
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~38 minutes

Poem

🐰 I hop where Brev clouds briefly glow,

I rsync code and watch tests flow,
Secrets tucked in gentle hops,
SSH sings — the pipeline stops,
Then cleanup nibbles, soft and slow.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 30.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and accurately summarizes the main change: adding ephemeral Brev E2E test infrastructure as a CI feature.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
.github/workflows/hardware-e2e.yaml (1)

50-52: Static concurrency group may cancel unrelated PR runs.

The group hardware-e2e is shared across all workflow runs. If PR #1 triggers a run and PR #2 triggers another, the first gets cancelled. This may be intentional to limit Brev resource usage, but could cause frustration in high-traffic scenarios.

Consider including context in the group name if parallel runs for different PRs are desired:

♻️ Optional: Isolate concurrency by PR
 concurrency:
-  group: hardware-e2e
+  group: hardware-e2e-${{ inputs.pr_number || github.run_id }}
   cancel-in-progress: true
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/hardware-e2e.yaml around lines 50 - 52, The concurrency
group "hardware-e2e" is static and will cancel unrelated PR runs; change the
concurrency.group value to include a run-specific identifier (e.g., PR number or
ref) so runs are isolated per PR or branch. Update the concurrency.group (the
key shown as "group: hardware-e2e") to an expression like hardware-e2e-${{
github.event.pull_request.number || github.ref_name }} (or use github.run_id for
full uniqueness) so each PR/branch gets its own concurrency group and unrelated
runs won't cancel each other.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/hardware-e2e.yaml:
- Around line 35-38: The input declaration for test_suite currently has both
required: true and a default ("full") which makes the default unused; fix by
deciding one behavior: either remove required: true so the workflow_call input
test_suite can be omitted and the default "full" will apply, or remove the
default value and keep required: true so callers must always supply test_suite;
update the test_suite input block accordingly (the input named test_suite in the
workflow_call inputs).

In `@scripts/e2e-brev-test.sh`:
- Around line 111-123: The run_test function currently interpolates
NVIDIA_API_KEY directly into the remote command string (in run_test), exposing
the secret; change the implementation to stop embedding ${NVIDIA_API_KEY} in the
SSH command and instead transmit it securely to the remote side (for example,
send the key over SSH stdin and set it on the remote shell before running the
test, or enable SSH env forwarding with SendEnv/AcceptEnv and use NVIDIA_API_KEY
from the remote environment). Update the ssh invocation in run_test so the
remote shell receives and exports NVIDIA_API_KEY without it appearing in the
logged command string, while preserving the rest of the behavior (exports for
npm_config_prefix, PATH, NEMOCLAW_* and the tee/PIPESTATUS handling).
- Around line 100-105: The SSH invocation in scripts/e2e-brev-test.sh currently
interpolates NVIDIA_API_KEY and GITHUB_TOKEN into the remote command string (the
ssh call that cd's into $REMOTE_DIR and runs bash scripts/brev-setup.sh),
exposing secrets in process listings; change this to pass secrets via stdin or a
remote temp file instead of embedding them in the command: send a here-doc or
pipe that sets/exports NVIDIA_API_KEY and GITHUB_TOKEN (and any NEMOCLAW_* vars)
and then invokes scripts/brev-setup.sh on the remote, or alternatively write
them to a secured temp file on the remote and source it before running
brev-setup.sh; ensure you remove the temp file afterward and avoid interpolating
the secret variables directly into the ssh command string.

---

Nitpick comments:
In @.github/workflows/hardware-e2e.yaml:
- Around line 50-52: The concurrency group "hardware-e2e" is static and will
cancel unrelated PR runs; change the concurrency.group value to include a
run-specific identifier (e.g., PR number or ref) so runs are isolated per PR or
branch. Update the concurrency.group (the key shown as "group: hardware-e2e") to
an expression like hardware-e2e-${{ github.event.pull_request.number ||
github.ref_name }} (or use github.run_id for full uniqueness) so each PR/branch
gets its own concurrency group and unrelated runs won't cancel each other.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1f81ef1a-cc14-4dd4-9dcf-4f6a3544ccbd

📥 Commits

Reviewing files that changed from the base of the PR and between fb03994 and 8cfc874.

📒 Files selected for processing (3)
  • .github/workflows/hardware-e2e.yaml
  • .gitignore
  • scripts/e2e-brev-test.sh

Comment on lines +100 to +105
ssh -o LogLevel=ERROR "$INSTANCE_NAME" "cd $REMOTE_DIR && \
export NVIDIA_API_KEY='${NVIDIA_API_KEY}' && \
export GITHUB_TOKEN='${GITHUB_TOKEN}' && \
export NEMOCLAW_NON_INTERACTIVE=1 && \
export NEMOCLAW_SANDBOX_NAME=e2e-test && \
bash scripts/brev-setup.sh" 2>&1 | tee -a "$LOG_FILE"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Secrets embedded in SSH command string are visible in process listings.

The NVIDIA_API_KEY and GITHUB_TOKEN values are interpolated into the SSH command line. This exposes them in:

  1. Local process listings (ps aux) during execution
  2. Potentially shell history or debug logs on the remote

Consider passing secrets more securely, for example via stdin:

🔒 Proposed fix to avoid secrets in command line
-ssh -o LogLevel=ERROR "$INSTANCE_NAME" "cd $REMOTE_DIR && \
-  export NVIDIA_API_KEY='${NVIDIA_API_KEY}' && \
-  export GITHUB_TOKEN='${GITHUB_TOKEN}' && \
-  export NEMOCLAW_NON_INTERACTIVE=1 && \
-  export NEMOCLAW_SANDBOX_NAME=e2e-test && \
-  bash scripts/brev-setup.sh" 2>&1 | tee -a "$LOG_FILE"
+ssh -o LogLevel=ERROR "$INSTANCE_NAME" "cd $REMOTE_DIR && \
+  export NVIDIA_API_KEY=\"\$(cat)\" && \
+  export GITHUB_TOKEN='${GITHUB_TOKEN}' && \
+  export NEMOCLAW_NON_INTERACTIVE=1 && \
+  export NEMOCLAW_SANDBOX_NAME=e2e-test && \
+  bash scripts/brev-setup.sh" <<< "$NVIDIA_API_KEY" 2>&1 | tee -a "$LOG_FILE"

Alternatively, write secrets to a temporary file on the remote and source it, or use ssh -o SendEnv= if the remote sshd is configured to accept environment variables.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
ssh -o LogLevel=ERROR "$INSTANCE_NAME" "cd $REMOTE_DIR && \
export NVIDIA_API_KEY='${NVIDIA_API_KEY}' && \
export GITHUB_TOKEN='${GITHUB_TOKEN}' && \
export NEMOCLAW_NON_INTERACTIVE=1 && \
export NEMOCLAW_SANDBOX_NAME=e2e-test && \
bash scripts/brev-setup.sh" 2>&1 | tee -a "$LOG_FILE"
ssh -o LogLevel=ERROR "$INSTANCE_NAME" "cd $REMOTE_DIR && \
export NVIDIA_API_KEY=\"\$(cat)\" && \
export GITHUB_TOKEN='${GITHUB_TOKEN}' && \
export NEMOCLAW_NON_INTERACTIVE=1 && \
export NEMOCLAW_SANDBOX_NAME=e2e-test && \
bash scripts/brev-setup.sh" <<< "$NVIDIA_API_KEY" 2>&1 | tee -a "$LOG_FILE"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/e2e-brev-test.sh` around lines 100 - 105, The SSH invocation in
scripts/e2e-brev-test.sh currently interpolates NVIDIA_API_KEY and GITHUB_TOKEN
into the remote command string (the ssh call that cd's into $REMOTE_DIR and runs
bash scripts/brev-setup.sh), exposing secrets in process listings; change this
to pass secrets via stdin or a remote temp file instead of embedding them in the
command: send a here-doc or pipe that sets/exports NVIDIA_API_KEY and
GITHUB_TOKEN (and any NEMOCLAW_* vars) and then invokes scripts/brev-setup.sh on
the remote, or alternatively write them to a secured temp file on the remote and
source it before running brev-setup.sh; ensure you remove the temp file
afterward and avoid interpolating the secret variables directly into the ssh
command string.

Comment on lines +111 to +123
run_test() {
local test_script="$1"
local test_name="$2"
info "Running $test_name..."
ssh -o LogLevel=ERROR "$INSTANCE_NAME" "cd $REMOTE_DIR && \
export npm_config_prefix=\$HOME/.local && \
export PATH=\$HOME/.local/bin:\$PATH && \
export NVIDIA_API_KEY='${NVIDIA_API_KEY}' && \
export NEMOCLAW_NON_INTERACTIVE=1 && \
export NEMOCLAW_SANDBOX_NAME=e2e-test && \
bash $test_script" 2>&1 | tee -a "$LOG_FILE"
return "${PIPESTATUS[0]}"
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Same secret exposure issue in run_test function.

Line 118 embeds NVIDIA_API_KEY in the command string, same concern as the bootstrap step. Apply the same mitigation strategy here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/e2e-brev-test.sh` around lines 111 - 123, The run_test function
currently interpolates NVIDIA_API_KEY directly into the remote command string
(in run_test), exposing the secret; change the implementation to stop embedding
${NVIDIA_API_KEY} in the SSH command and instead transmit it securely to the
remote side (for example, send the key over SSH stdin and set it on the remote
shell before running the test, or enable SSH env forwarding with
SendEnv/AcceptEnv and use NVIDIA_API_KEY from the remote environment). Update
the ssh invocation in run_test so the remote shell receives and exports
NVIDIA_API_KEY without it appearing in the logged command string, while
preserving the rest of the behavior (exports for npm_config_prefix, PATH,
NEMOCLAW_* and the tee/PIPESTATUS handling).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why build a test runner from scratch? We should write this as vitest and get assertions, reporting, async behavior etc for free.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair point. putting that in

@jyaunches jyaunches force-pushed the feat/ephemeral-brev-e2e branch from 8aede10 to 2f9ba13 Compare March 24, 2026 16:16
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
scripts/e2e-brev-test.sh (1)

103-108: ⚠️ Potential issue | 🟠 Major

Stop embedding secrets in remote command strings (still exposed).

NVIDIA_API_KEY/GITHUB_TOKEN are still interpolated into SSH command arguments. Pass them via stdin (or remote temp file) instead.

🔒 Proposed fix pattern for both SSH calls
 ssh -o LogLevel=ERROR "$INSTANCE_NAME" "cd $REMOTE_DIR && \
-  export NVIDIA_API_KEY='${NVIDIA_API_KEY}' && \
-  export GITHUB_TOKEN='${GITHUB_TOKEN}' && \
-  export NEMOCLAW_NON_INTERACTIVE=1 && \
-  export NEMOCLAW_SANDBOX_NAME=e2e-test && \
-  bash scripts/brev-setup.sh" 2>&1 | tee -a "$LOG_FILE"
+  :"  # keep structure for context
+
+printf '%s\n%s\n' "$NVIDIA_API_KEY" "$GITHUB_TOKEN" | \
+ssh -o LogLevel=ERROR "$INSTANCE_NAME" "cd '$REMOTE_DIR' && \
+  IFS= read -r NVIDIA_API_KEY && \
+  IFS= read -r GITHUB_TOKEN && \
+  export NVIDIA_API_KEY GITHUB_TOKEN && \
+  export NEMOCLAW_NON_INTERACTIVE=1 && \
+  export NEMOCLAW_SANDBOX_NAME=e2e-test && \
+  bash scripts/brev-setup.sh" 2>&1 | tee -a "$LOG_FILE"
@@
   ssh -o LogLevel=ERROR "$INSTANCE_NAME" "cd $REMOTE_DIR && \
     export npm_config_prefix=\$HOME/.local && \
     export PATH=\$HOME/.local/bin:\$PATH && \
-    export NVIDIA_API_KEY='${NVIDIA_API_KEY}' && \
-    export NEMOCLAW_NON_INTERACTIVE=1 && \
-    export NEMOCLAW_SANDBOX_NAME=e2e-test && \
-    bash $test_script" 2>&1 | tee -a "$LOG_FILE"
+    :"  # keep structure for context
+  printf '%s\n' "$NVIDIA_API_KEY" | \
+  ssh -o LogLevel=ERROR "$INSTANCE_NAME" "cd '$REMOTE_DIR' && \
+    IFS= read -r NVIDIA_API_KEY && \
+    export npm_config_prefix=\$HOME/.local && \
+    export PATH=\$HOME/.local/bin:\$PATH && \
+    export NVIDIA_API_KEY && \
+    export NEMOCLAW_NON_INTERACTIVE=1 && \
+    export NEMOCLAW_SANDBOX_NAME=e2e-test && \
+    bash '$test_script'" 2>&1 | tee -a "$LOG_FILE"

Also applies to: 118-124

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/e2e-brev-test.sh` around lines 103 - 108, The SSH invocation in the
e2e-brev-test.sh script currently interpolates secrets (NVIDIA_API_KEY and
GITHUB_TOKEN) into the remote command string; change the ssh call that runs "cd
$REMOTE_DIR && ... bash scripts/brev-setup.sh" so secrets are not embedded: send
NVIDIA_API_KEY and GITHUB_TOKEN via stdin or write them to a secure remote temp
file before invoking bash, then source/read them in the remote environment (keep
exports for NEMOCLAW_* as-is), and ensure the remote temp file is removed;
update both occurrences of the ssh command (the block using INSTANCE_NAME,
REMOTE_DIR, and LOG_FILE around lines shown and the second SSH at 118-124) so no
secret values are interpolated into the command string.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/e2e-brev-test.sh`:
- Around line 46-49: The cleanup() function unconditionally calls brev delete
"$INSTANCE_NAME", which can remove a pre-existing instance if creation/login
failed; add a boolean flag (e.g., INSTANCE_CREATED or CREATED_BY_RUN) that is
set to true only after successful creation in the create/login flow, then modify
cleanup() to only call brev delete when that flag is true; apply the same guard
to the other deletion call referenced around lines 67-68 so deletions only occur
for instances created by this run and not for pre-existing resources.

---

Duplicate comments:
In `@scripts/e2e-brev-test.sh`:
- Around line 103-108: The SSH invocation in the e2e-brev-test.sh script
currently interpolates secrets (NVIDIA_API_KEY and GITHUB_TOKEN) into the remote
command string; change the ssh call that runs "cd $REMOTE_DIR && ... bash
scripts/brev-setup.sh" so secrets are not embedded: send NVIDIA_API_KEY and
GITHUB_TOKEN via stdin or write them to a secure remote temp file before
invoking bash, then source/read them in the remote environment (keep exports for
NEMOCLAW_* as-is), and ensure the remote temp file is removed; update both
occurrences of the ssh command (the block using INSTANCE_NAME, REMOTE_DIR, and
LOG_FILE around lines shown and the second SSH at 118-124) so no secret values
are interpolated into the command string.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3b9d7093-3ee9-40bf-8f2b-c129a308733f

📥 Commits

Reviewing files that changed from the base of the PR and between 8aede10 and 2f9ba13.

📒 Files selected for processing (2)
  • scripts/e2e-brev-test.sh
  • test/e2e/test-full-e2e.sh
✅ Files skipped from review due to trivial changes (1)
  • test/e2e/test-full-e2e.sh

Comment on lines +46 to +49
cleanup() {
info "Cleaning up: deleting instance $INSTANCE_NAME..."
brev delete "$INSTANCE_NAME" 2>/dev/null || warn "Failed to delete instance (may not exist)"
info "Log saved to $LOG_FILE"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Guard cleanup deletion with a “created-by-this-run” flag.

cleanup() always calls brev delete "$INSTANCE_NAME" even if creation never succeeded. A failed login/create path can delete a pre-existing instance with the same name.

🔧 Proposed fix
 REMOTE_EXIT_CODE=0
+INSTANCE_CREATED=0
 
 # ── Cleanup trap — always delete the instance ────────────────────
 cleanup() {
-  info "Cleaning up: deleting instance $INSTANCE_NAME..."
-  brev delete "$INSTANCE_NAME" 2>/dev/null || warn "Failed to delete instance (may not exist)"
+  if [ "$INSTANCE_CREATED" -eq 1 ]; then
+    info "Cleaning up: deleting instance $INSTANCE_NAME..."
+    brev delete "$INSTANCE_NAME" 2>/dev/null || warn "Failed to delete instance (may not exist)"
+  else
+    info "Cleanup: no instance created by this run; skipping delete"
+  fi
   info "Log saved to $LOG_FILE"
 }
 trap cleanup EXIT
@@
 brev create "$INSTANCE_NAME" --cpu "$BREV_CPU" --detached 2>&1 || fail "Failed to create instance"
+INSTANCE_CREATED=1
 info "Instance creation started"

Also applies to: 67-68

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/e2e-brev-test.sh` around lines 46 - 49, The cleanup() function
unconditionally calls brev delete "$INSTANCE_NAME", which can remove a
pre-existing instance if creation/login failed; add a boolean flag (e.g.,
INSTANCE_CREATED or CREATED_BY_RUN) that is set to true only after successful
creation in the create/login flow, then modify cleanup() to only call brev
delete when that flag is true; apply the same guard to the other deletion call
referenced around lines 67-68 so deletions only occur for instances created by
this run and not for pre-existing resources.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
test/e2e/brev-e2e.test.js (2)

74-87: Minor: Consider using native Node.js sleep instead of spawning a shell.

Using execSync("sleep ...") works but spawns a shell process for each iteration. A native approach would be more idiomatic, though it would require making waitForSsh async.

♻️ Alternative using native setTimeout
-function waitForSsh(maxAttempts = 60, intervalMs = 5_000) {
+async function waitForSsh(maxAttempts = 60, intervalMs = 5_000) {
+  const sleep = (ms) => new Promise((r) => setTimeout(r, ms));
   for (let i = 1; i <= maxAttempts; i++) {
     try {
       ssh("echo ok", { timeout: 10_000 });
       return;
     } catch {
       if (i === maxAttempts) throw new Error(`SSH not ready after ${maxAttempts} attempts`);
       if (i % 5 === 0) {
         try { brev("refresh"); } catch { /* ignore */ }
       }
-      execSync(`sleep ${intervalMs / 1000}`);
+      await sleep(intervalMs);
     }
   }
 }

Note: This would require beforeAll to be async and call await waitForSsh().

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/brev-e2e.test.js` around lines 74 - 87, The current waitForSsh
function blocks by spawning a shell with execSync("sleep ..."); refactor
waitForSsh to be async (function waitForSsh(...)) and replace execSync sleep
with a non-blocking await (e.g., await new Promise(resolve =>
setTimeout(resolve, intervalMs))); keep the retry logic, timeouts for ssh("echo
ok", { timeout: 10_000 }) and the brev("refresh") calls but await where needed,
and update any callers (e.g., beforeAll) to be async and call await
waitForSsh(...).

137-143: Instance cleanup may not run on forcible termination.

If the test process receives SIGKILL (e.g., CI timeout, OOM), afterAll won't execute and the Brev instance will be orphaned. Consider documenting this limitation or adding external cleanup (e.g., a scheduled job to delete stale e2e-pr-* instances).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/brev-e2e.test.js` around lines 137 - 143, The afterAll cleanup (in
test/e2e/brev-e2e.test.js) that calls brev("delete", INSTANCE_NAME) won't run on
forcible termination (SIGKILL), so add mitigation: either update the repository
docs/README or e2e test README to document this limitation and advise running an
external scheduled cleanup, or add an auxiliary maintenance job (e.g., a CI
scheduled workflow or cron job) that deletes stale instances matching the e2e
prefix (e.g., e2e-pr-*) by invoking the same brev delete logic; ensure the
instructions reference the afterAll cleanup and the brev("delete",
INSTANCE_NAME) behavior so maintainers know what external cleanup must
replicate.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/e2e-brev.yaml:
- Around line 128-134: The workflow's "Upload test logs" step is trying to
upload /tmp/brev-e2e-*.log from the runner but the logs live on the remote Brev
VM; add a prior step (e.g., "Download remote logs" or within the SSH teardown
flow that runs before the "Upload test logs" step) that copies the remote log
files (e.g., /tmp/nemoclaw-*-e2e-*.log) from the VM to the runner workspace (for
example via scp or an ssh command that streams files into $GITHUB_WORKSPACE or
/tmp on the runner), then update the Upload test logs step to point to the
downloaded location so artifacts are actually uploaded. Ensure the download step
runs regardless of test failure so logs are available to upload.

In `@test/e2e/brev-e2e.test.js`:
- Around line 54-60: The secret exports in sshWithSecrets build secretPreamble
using raw process.env values (NVIDIA_API_KEY, GITHUB_TOKEN) which breaks the
shell if they contain single quotes; update sshWithSecrets to sanitize/escape
single quotes in those env values before joining (e.g., replace each ' with the
shell-safe sequence '\''), and use the escaped values when constructing
secretPreamble (apply the same escaping to any other exported secret like
NEMOCLAW_* if needed) so the resulting export lines are safe to interpolate into
the remote shell command.

---

Nitpick comments:
In `@test/e2e/brev-e2e.test.js`:
- Around line 74-87: The current waitForSsh function blocks by spawning a shell
with execSync("sleep ..."); refactor waitForSsh to be async (function
waitForSsh(...)) and replace execSync sleep with a non-blocking await (e.g.,
await new Promise(resolve => setTimeout(resolve, intervalMs))); keep the retry
logic, timeouts for ssh("echo ok", { timeout: 10_000 }) and the brev("refresh")
calls but await where needed, and update any callers (e.g., beforeAll) to be
async and call await waitForSsh(...).
- Around line 137-143: The afterAll cleanup (in test/e2e/brev-e2e.test.js) that
calls brev("delete", INSTANCE_NAME) won't run on forcible termination (SIGKILL),
so add mitigation: either update the repository docs/README or e2e test README
to document this limitation and advise running an external scheduled cleanup, or
add an auxiliary maintenance job (e.g., a CI scheduled workflow or cron job)
that deletes stale instances matching the e2e prefix (e.g., e2e-pr-*) by
invoking the same brev delete logic; ensure the instructions reference the
afterAll cleanup and the brev("delete", INSTANCE_NAME) behavior so maintainers
know what external cleanup must replicate.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 98e4d405-4e6e-4fdf-b20e-fe3a8fb1a861

📥 Commits

Reviewing files that changed from the base of the PR and between 2f9ba13 and 1fb8d7d.

📒 Files selected for processing (3)
  • .github/workflows/e2e-brev.yaml
  • test/e2e/brev-e2e.test.js
  • vitest.config.ts

Comment on lines +128 to +134
- name: Upload test logs
if: failure()
uses: actions/upload-artifact@v4
with:
name: e2e-brev-logs
path: /tmp/brev-e2e-*.log
if-no-files-found: ignore
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if any script creates brev-e2e logs
rg -n "brev-e2e.*\.log" --type sh --type bash
rg -n "/tmp/brev-e2e"

Repository: NVIDIA/NemoClaw

Length of output: 87


🏁 Script executed:

#!/bin/bash
# Find the brev-e2e test file
fd -e js -e ts | grep -i "brev-e2e"

Repository: NVIDIA/NemoClaw

Length of output: 84


🏁 Script executed:

#!/bin/bash
# Search for patterns that create brev-e2e logs more broadly
rg "brev-e2e.*\.log" -i
rg "/tmp/brev-e2e" -i

Repository: NVIDIA/NemoClaw

Length of output: 41


🏁 Script executed:

#!/bin/bash
# Check the e2e-brev.yaml workflow file content
cat -n .github/workflows/e2e-brev.yaml | head -150

Repository: NVIDIA/NemoClaw

Length of output: 5374


🏁 Script executed:

#!/bin/bash
# Read the test file to see if it creates log files
wc -l test/e2e/brev-e2e.test.js

Repository: NVIDIA/NemoClaw

Length of output: 88


🏁 Script executed:

#!/bin/bash
# Read first part of the test file
head -100 test/e2e/brev-e2e.test.js

Repository: NVIDIA/NemoClaw

Length of output: 3346


🏁 Script executed:

#!/bin/bash
# Check if there's a vitest config file
fd -e config.ts -e config.js -e vitest.config.ts -e vitest.config.js

Repository: NVIDIA/NemoClaw

Length of output: 96


🏁 Script executed:

#!/bin/bash
# Read the vitest config file
cat vitest.config.ts

Repository: NVIDIA/NemoClaw

Length of output: 904


🏁 Script executed:

#!/bin/bash
# Read the rest of the test file
tail -65 test/e2e/brev-e2e.test.js

Repository: NVIDIA/NemoClaw

Length of output: 2173


🏁 Script executed:

#!/bin/bash
# Check if remote test scripts reference log files
rg "\.log" test/e2e/

Repository: NVIDIA/NemoClaw

Length of output: 128


🏁 Script executed:

#!/bin/bash
# List all files in test/e2e directory
ls -la test/e2e/

Repository: NVIDIA/NemoClaw

Length of output: 449


Upload path won't find any files since logs are created on the remote VM, not the runner.

The test runs on an ephemeral Brev instance via SSH and creates logs on the remote VM (e.g., /tmp/nemoclaw-e2e-install.log), but the workflow step attempts to upload from the runner's /tmp/brev-e2e-*.log. The if-no-files-found: ignore prevents failure, but no logs are uploaded when tests fail, making debugging harder.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/e2e-brev.yaml around lines 128 - 134, The workflow's
"Upload test logs" step is trying to upload /tmp/brev-e2e-*.log from the runner
but the logs live on the remote Brev VM; add a prior step (e.g., "Download
remote logs" or within the SSH teardown flow that runs before the "Upload test
logs" step) that copies the remote log files (e.g., /tmp/nemoclaw-*-e2e-*.log)
from the VM to the runner workspace (for example via scp or an ssh command that
streams files into $GITHUB_WORKSPACE or /tmp on the runner), then update the
Upload test logs step to point to the downloaded location so artifacts are
actually uploaded. Ensure the download step runs regardless of test failure so
logs are available to upload.

@jyaunches jyaunches force-pushed the feat/ephemeral-brev-e2e branch from 1fb8d7d to 89b90da Compare March 24, 2026 20:09
@jyaunches
Copy link
Copy Markdown
Contributor Author

Addressing CodeRabbit feedback on secret exposure (#2982050976, #2982050993):

scripts/e2e-brev-test.sh has been replaced with a vitest suite (test/e2e/brev-e2e.test.js). Secrets are now passed via execSync's input option (piped through stdin), so they never appear in the command string or process listings.

// Secrets piped via stdin, not interpolated into the command
return execSync(
  `ssh ... '${INSTANCE_NAME}' 'eval "$(cat)" && ${cmd}'`,
  { input: secretPreamble, ... },
);

@jyaunches jyaunches force-pushed the feat/ephemeral-brev-e2e branch from 89b90da to 3b7c530 Compare March 24, 2026 20:10
@wscurran wscurran added CI/CD Use this label to identify issues with NemoClaw CI/CD pipeline or GitHub Actions. enhancement: feature Use this label to identify requests for new capabilities in NemoClaw. Platform: Brev Support for Brev deployment labels Mar 24, 2026
@jyaunches jyaunches force-pushed the feat/ephemeral-brev-e2e branch 2 times, most recently from c6f92d5 to 5f502d6 Compare March 24, 2026 22:34

- name: Install Brev CLI
run: |
# Pin to v0.6.310 — v0.6.322 removed --cpu flag and defaults to GPU instances
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--cpu flag was deprecated. The new way to deploy a CPU is as follows brev create --type n2-standard-4 test-cpu

Copy link
Copy Markdown
Contributor

@cv cv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good — secret handling via stdin is solid, vitest migration is clean, CI is green. The Brev --cpu deprecation is worth a follow-up but not a blocker.

Add a vitest-based E2E test suite that provisions an ephemeral Brev
CPU instance, bootstraps it with NemoClaw (Node.js, Docker, OpenShell,
gateway, sandbox), runs tests remotely, and optionally keeps the
instance alive for SSH debugging.

New files:
- test/e2e/brev-e2e.test.js: vitest suite — create, bootstrap, test, cleanup
- .github/workflows/e2e-brev.yaml: workflow with dispatch inputs, PR
  reporting, and keep_alive option for SSH debugging

Changes:
- vitest.config.ts: added e2e-brev project, excluded from default runs
- test/e2e/test-full-e2e.sh: pass --yes to destroy, shfmt formatting
- .gitignore: added specs/

Key design decisions:
- Ephemeral instances — no persistent state, no idle cost
- CPU-only (--cpu 4x16) — OpenShell gateway works without GPU
- Brev CLI pinned to v0.6.310 (v0.6.322 removed --cpu flag)
- Secrets passed via stdin to avoid exposure in process listings
- keep_alive defaults to true — instance stays up for SSH debugging
- describe.runIf skips tests when secrets unavailable (normal CI)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jyaunches jyaunches force-pushed the feat/ephemeral-brev-e2e branch from d6b3058 to 77f46f2 Compare March 24, 2026 23:13
@cv cv merged commit a354c5a into NVIDIA:main Mar 24, 2026
5 checks passed
jyaunches added a commit to jyaunches/NemoClaw that referenced this pull request Mar 25, 2026
…nitization

Adds two new Brev E2E test suites targeting the vulnerabilities fixed by
PR NVIDIA#119 (Telegram bridge command injection) and PR NVIDIA#156 (credential
exposure in migration snapshots + blueprint digest bypass).

Test suites:
- test-telegram-injection.sh: 8 tests covering command substitution,
  backtick injection, quote-breakout, parameter expansion, process
  table leaks, and SANDBOX_NAME validation
- test-credential-sanitization.sh: 13 tests covering auth-profiles.json
  deletion, credential field stripping, non-credential preservation,
  symlink safety, blueprint digest verification, and pattern-based
  field detection

These tests are expected to FAIL on main (unfixed code) and PASS
once PR NVIDIA#119 and NVIDIA#156 are merged.

Refs: NVIDIA#118, NVIDIA#119, NVIDIA#156, NVIDIA#813
jyaunches added a commit to jyaunches/NemoClaw that referenced this pull request Mar 25, 2026
Adds specification for a warm-pool model that eliminates the 40+ min
cold-start Docker build time from Brev E2E test runs.

Key design:
- Pre-warmed Brev instances using NemoClaw launchable
- Naming convention (e2e-warm-*) as state — no external tracking
- PR comment trigger (/test-brev <suites>) for maintainers
- Parallel test execution across pool instances
- Daily health check + 24h instance cycling
- Fallback to ephemeral mode if pool is empty

6 phases: pool script → warmer workflow → test runner →
PR trigger → run security tests → cleanup

Also includes the original NVIDIA#813 hardware E2E spec for reference.

Refs: NVIDIA#813, NVIDIA#118, NVIDIA#119, NVIDIA#156
temrjan pushed a commit to temrjan/NemoClaw that referenced this pull request Mar 25, 2026
Add a vitest-based E2E test suite that provisions an ephemeral Brev
CPU instance, bootstraps it with NemoClaw (Node.js, Docker, OpenShell,
gateway, sandbox), runs tests remotely, and optionally keeps the
instance alive for SSH debugging.

New files:
- test/e2e/brev-e2e.test.js: vitest suite — create, bootstrap, test, cleanup
- .github/workflows/e2e-brev.yaml: workflow with dispatch inputs, PR
  reporting, and keep_alive option for SSH debugging

Changes:
- vitest.config.ts: added e2e-brev project, excluded from default runs
- test/e2e/test-full-e2e.sh: pass --yes to destroy, shfmt formatting
- .gitignore: added specs/

Key design decisions:
- Ephemeral instances — no persistent state, no idle cost
- CPU-only (--cpu 4x16) — OpenShell gateway works without GPU
- Brev CLI pinned to v0.6.310 (v0.6.322 removed --cpu flag)
- Secrets passed via stdin to avoid exposure in process listings
- keep_alive defaults to true — instance stays up for SSH debugging
- describe.runIf skips tests when secrets unavailable (normal CI)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
jyaunches added a commit to jyaunches/NemoClaw that referenced this pull request Mar 30, 2026
…nitization

Adds two new Brev E2E test suites targeting the vulnerabilities fixed by
PR NVIDIA#119 (Telegram bridge command injection) and PR NVIDIA#156 (credential
exposure in migration snapshots + blueprint digest bypass).

Test suites:
- test-telegram-injection.sh: 8 tests covering command substitution,
  backtick injection, quote-breakout, parameter expansion, process
  table leaks, and SANDBOX_NAME validation
- test-credential-sanitization.sh: 13 tests covering auth-profiles.json
  deletion, credential field stripping, non-credential preservation,
  symlink safety, blueprint digest verification, and pattern-based
  field detection

These tests are expected to FAIL on main (unfixed code) and PASS
once PR NVIDIA#119 and NVIDIA#156 are merged.

Refs: NVIDIA#118, NVIDIA#119, NVIDIA#156, NVIDIA#813
cv added a commit that referenced this pull request Mar 30, 2026
…al sanitization (#1092)

* test(security): add E2E tests for command injection and credential sanitization

Adds two new Brev E2E test suites targeting the vulnerabilities fixed by
PR #119 (Telegram bridge command injection) and PR #156 (credential
exposure in migration snapshots + blueprint digest bypass).

Test suites:
- test-telegram-injection.sh: 8 tests covering command substitution,
  backtick injection, quote-breakout, parameter expansion, process
  table leaks, and SANDBOX_NAME validation
- test-credential-sanitization.sh: 13 tests covering auth-profiles.json
  deletion, credential field stripping, non-credential preservation,
  symlink safety, blueprint digest verification, and pattern-based
  field detection

These tests are expected to FAIL on main (unfixed code) and PASS
once PR #119 and #156 are merged.

Refs: #118, #119, #156, #813

* ci: temporarily disable repo guard for fork testing

* ci: bump bootstrap timeout, skip vLLM on CPU E2E runs

- Add SKIP_VLLM=1 support to brev-setup.sh
- Use SKIP_VLLM=1 in brev-e2e.test.js bootstrap
- Bump beforeAll timeout to 30 min for CPU instances
- Bump workflow timeout to 60 min for 3 test suites

* ci: bump bootstrap timeout to 40 min for sandbox image build

* ci: bump Brev instance to 8x32 for faster Docker builds

* ci: add real-time progress streaming for E2E bootstrap and tests

- Stream SSH output to CI log during bootstrap (no more silence)
- Add timestamps to brev-setup.sh and setup.sh info/warn/fail messages
- Add background progress reporter during sandbox Docker build (heartbeat every 30s showing elapsed time, current Docker step, and last log line)
- Stream test script output to CI log via tee + capture for assertions
- Filter potential secrets from progress heartbeat output

* ci: use NemoClaw launchable for E2E bootstrap

Replace bare 'brev create' + brev-setup.sh with 'brev start' using the
OpenShell-Community launch-nemoclaw.sh setup script. This installs Docker,
OpenShell CLI, and Node.js via the launchable's proven path, then runs
'nemoclaw onboard --non-interactive' to build the sandbox (testing whether
this path is faster than our manual setup.sh).

Changes:
- Default CPU back to 4x16 (8x32 didn't help — bottleneck was I/O)
- Launchable path: brev start + setup-script URL, poll for completion,
  rsync PR branch, npm ci, nemoclaw onboard
- Legacy path preserved (USE_LAUNCHABLE=0)
- Timestamped logging throughout for timing comparison
- New use_launchable workflow input (default: true)

* fix: prevent openshell sandbox create from hanging in non-interactive mode

openshell sandbox create without a command defaults to opening an interactive
shell inside the sandbox. In CI (non-interactive SSH), this hangs forever —
the sandbox goes Ready but the command never returns. The [?2004h] terminal
escape codes in CI logs were bash enabling bracketed paste mode, waiting for
input.

Add --no-tty -- true so the command exits immediately after the sandbox is
created and Ready.

* fix: source nvm in non-interactive SSH for launchable path

The launchable setup script installs Node.js via nvm, which sets up PATH
in ~/.nvm/nvm.sh. Non-interactive SSH doesn't source .bashrc, so npm/node
commands fail with 'command not found'. Source nvm.sh before running npm
in the launchable path and runRemoteTest.

* fix: setup.sh respects NEMOCLAW_SANDBOX_NAME env var

setup.sh defaulted to 'nemoclaw' ignoring the NEMOCLAW_SANDBOX_NAME env
var set by the CI test harness (e2e-test). Now uses $1 > $NEMOCLAW_SANDBOX_NAME > nemoclaw.

* ci: bump full E2E test timeout to 15 min for install + sandbox build

* ci: don't run full E2E alongside security tests (it destroys the sandbox)

The full E2E test runs install.sh --non-interactive which destroys and
rebuilds the sandbox. When TEST_SUITE=all, this kills the sandbox that
beforeAll created, causing credential-sanitization and telegram-injection
to fail with 'sandbox not running'. Only run full E2E when TEST_SUITE=full.

* ci: pre-build base image locally when GHCR image unavailable

On forks or before the first base-image workflow run, the GHCR base image
(ghcr.io/nvidia/nemoclaw/sandbox-base:latest) doesn't exist. This causes
the Dockerfile's FROM to fail. Now setup.sh checks for the base image
and builds Dockerfile.base locally if needed.

On subsequent builds, Docker layer cache makes this near-instant.
Once the GHCR base image is available, this becomes a no-op (docker pull
succeeds and the local build is skipped).

* ci: install nemoclaw CLI after bootstrap in non-launchable path

brev-setup.sh creates the sandbox but doesn't install the host-side
nemoclaw CLI that test scripts need for 'nemoclaw <name> status'.
Add npm install + build + link step after bootstrap.

* fix: use npm_config_prefix for nemoclaw CLI install so it lands on PATH

* fix: npm link from repo root where bin.nemoclaw is defined

* fix(ci): register sandbox in nemoclaw registry after setup.sh bootstrap

setup.sh creates the sandbox via openshell directly but never writes
~/.nemoclaw/sandboxes.json. The security test scripts check
`nemoclaw <name> status` which reads the registry, causing all E2E
runs to fail with 'Sandbox e2e-test not running'.

Write the registry entry after nemoclaw CLI install so the test
scripts can find the sandbox.

* style: shfmt formatting fix in setup.sh

* fix(test): exclude policy presets from C7 secret pattern scan

C7 greps for 'npm_' inside the sandbox and false-positives on
nemoclaw-blueprint/policies/presets/npm.yaml which contains rule
names like 'npm_yarn', not actual credentials. Filter out /policies/
paths from all three pattern checks.

* docs(ci): add test suite descriptions to e2e-brev workflow header

Document what each test_suite option runs so maintainers can make an
informed choice from the Actions UI without reading the test scripts.

* ci: re-enable repo guard for e2e-brev workflow

Re-enable the github.repository check so the workflow only runs on
NVIDIA/NemoClaw, not on forks.

* fix(test): update setup-sandbox-name test for NEMOCLAW_SANDBOX_NAME env var

setup.sh now uses ${1:-${NEMOCLAW_SANDBOX_NAME:-nemoclaw}} instead of
${1:-nemoclaw}. Update the test to match and add coverage for the env
var fallback path.

* fix(lint): add shellcheck directives for injection test payloads and fix stdio type

* fix(lint): suppress SC2034 for status_output in credential sanitization test

* fix: address CodeRabbit review — timeout, pipefail, fail-closed probes, shell injection in test

- Bump e2e-brev workflow timeout-minutes from 60 to 90
- Add fail-fast when launchable setup exceeds 40-min wait
- Add pipefail to remote pipeline commands in runRemoteTest and npm ci
- Fix backtick shell injection in validateName test loop (use process.argv)
- Make sandbox_exec fail closed with __PROBE_FAILED__ sentinel
- Add probe failure checks in C6/C7 sandbox assertions

---------

Co-authored-by: Carlos Villela <cvillela@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CI/CD Use this label to identify issues with NemoClaw CI/CD pipeline or GitHub Actions. enhancement: feature Use this label to identify requests for new capabilities in NemoClaw. Platform: Brev Support for Brev deployment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants